Foreword


This project is developing a system to compress data such as imagery and speech to near-maximum
levels. We are using the Cottrell/Munro/Zipser neural network technique to
implement a vector quantization method (which we call data neutronium in analogy with solid
neutronium, the densest possible form of matter) that uses a mathematically defined codebook
on the data manifold from which the data to be compressed is drawn. Unlike fractal
compression  (another  compression  method  which  may  be  able  to  achieve  near-maximum 
compression,  but  which  is  computationally  impractical),  the  C/M/Z  neural  network  (which  is 
trained  off-line,  once)  is  computationally  simple  and  can  carry  out  both  data  compression  and 
decompression  in  real-time  using  low-cost  hardware.  Our  Phase  I  goal  was  to  demonstrate  a 
50:1  compression  ratio  for  imagery  with  an  RMS  pixel  grey-scale  error  of  less  than  25  (10%  of 
the  256  total  brightness  quantization  levels).  We  exceeded  this  by  demonstrating  64:1 
compression  at  an  RMS  pixel  error  level  of  20.7  grey  scale  levels.  During  Phase  I  we  also 
developed  a  precise  mathematical  definition  of  the  data  neutronium  compression  method.  We 
also  proved,  for  the  case  of  large  compression  problems  (e.g.,  large  image  tiles  or  long  speech 
sample  time  windows)  that  the  mean  squared  error  distortion  of  the  data  neutronium  method  will 
be  no  more  than  43%  greater  than  an  optimal  source  coding  system  with  the  same  number  of 
transmitted  bits.  This  puts  data  neutronium  in  a  category  by  itself,  as  it  is  the  only  potentially 
practical  data  compression  method  that  is  provably  near-optimal. 
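
The error figure quoted above is a root-mean-square difference in grey levels between original and reconstructed pixels. A minimal sketch of that measurement follows, assuming 8-bit grey-scale pixels stored as flat lists; the function name and the sample pixel values are hypothetical and are not taken from the project's software.

```python
import math

def rms_error(original, reconstructed):
    """Root-mean-square difference between two equal-length pixel sequences."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and the same length")
    squared = sum((a - b) ** 2 for a, b in zip(original, reconstructed))
    return math.sqrt(squared / len(original))

# Hypothetical 8-bit grey-scale pixels (a flattened tile) and a lossy reconstruction.
original = [10, 200, 128, 64]
reconstructed = [14, 190, 130, 70]

GREY_LEVELS = 256   # total brightness quantization levels, as stated in the text
TARGET_RMS = 25     # Phase I goal, as stated in the text

error = rms_error(original, reconstructed)
percent = 100 * error / GREY_LEVELS
print(f"RMS error: {error:.2f} grey levels ({percent:.1f}% of the range)")
print("meets Phase I target:", error < TARGET_RMS)
```

On the same scale, the Phase I target of 25 grey levels is about 10% of the 256-level range, and the reported 20.7 is about 8%.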

1.  Statement  of  the  Problem  Studied 

Data  compression  has  a  long  history.  Almost  60  years  ago  the  first  analog  vocoders  (speech 
signal  compressors)  were  built  (Dudley).  These  systems  took  ordinary  speech  (which  has  a 
bandwidth of about 3000 Hz, from 200 Hz to 3200 Hz) and reduced its bandwidth by about a
factor of 10. Modern vocoders [O'Shaughnessy] can reduce the normal 64 kbit/second speech
data rate used in telephony down to 580 bits/second (a compression ratio of 110) while retaining
good  speech  quality.  Unfortunately,  the  hardware  required  to  carry  out  the  associated 
processing  is  much  too  expensive  for  general  use.  Methods  suitable  for  widespread  use  (such 
as  Linear  Predictive  Coding  (O'Shaughnessy))  achieve  compression  ratios  of  little  more  than 
10.  Similar  statements  hold  for  image  compression,  where  the  best  known  technique  (fractal 
compression  (Stark))  can,  at  least  for  many  images,  achieve  compression  ratios  around  100 
(given  unlimited  compression  time),  but  practical  techniques  (such  as  the  JPEG  algorithm)  can 
only  achieve  a  compression  ratio  of  about  10. 
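
The ratios quoted in this section are simply the uncompressed rate divided by the compressed rate. The short sketch below reproduces the speech figure from the rates given above and, assuming 8-bit grey-scale imagery (an assumption, since the paragraph does not state a pixel depth), shows what the image ratios mean in bits per pixel. The function name is illustrative only.

```python
def compression_ratio(uncompressed_rate, compressed_rate):
    """Ratio of uncompressed to compressed size or bit rate (same units for both)."""
    if compressed_rate <= 0:
        raise ValueError("compressed rate must be positive")
    return uncompressed_rate / compressed_rate

# Speech figures quoted in the text: 64 kbit/second telephony vs. a 580 bit/second vocoder.
speech_ratio = compression_ratio(64_000, 580)
print(f"speech: {speech_ratio:.1f}:1")

# Assumption: 8 bits per pixel, so a ratio r leaves 8 / r bits per pixel on average.
BITS_PER_PIXEL = 8
for ratio in (10, 100):
    print(f"image at {ratio}:1 -> {BITS_PER_PIXEL / ratio:.2f} bits/pixel")
```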

All data compression problems are special cases of the source coding problem of
information  theory.  The  general  theory  of  source  coding  was  developed  by  Claude  Shannon 
about  45  years  ago  (McEliece).  Source  coding  theory  provides  upper  bounds  on  the  achievable 
compression  ratio  for  each  type  of  information  source.  It  also  provides  performance  bounds  for 
particular coding schemes. Typically, researchers in source coding study general mathematical
theories  and  not  specific  data  compression  problems.  Researchers  in  data  compression  typically 
study  their  specific  data  compression  problem  without  much  regard  for  source  coding  theory.  It 
would  seem  desirable  to  bring  these  two  research  streams  into  closer  contact,  but  this  has  not 
happened  yet. 

What  is  clear  is  that  there  is  still  great  potential  for  increased  performance  in  data  compression. 
Many  researchers  in  source  coding  and  data  compression  believe  that  both  telephone  speech  and 
television imagery will someday be routinely compressed at ratios exceeding 1,000. This is two
orders of magnitude beyond what can be practically achieved today. The problem addressed by
this project is the development of a general-purpose practical data compression method that can
push compression ratios at least an order of magnitude beyond the best practical systems today,
and do so at low implementation cost. This method should exploit insights from both data
compression research and source coding theory.

2.  Summary  of  Results 

This  project  is  exploring  a  new  approach  to  data  compression.  In  a  nutshell,  this  new  approach 
exploits  a  generalization  of  principal  component  analysis  to  create  a  grid  of  points  in  the  data 
space. This grid is then used as a vector quantization codebook. During Phase I we
demonstrated  that  the  mean  squared  error  distortion  of  such  a  source  coding  system  would 
asymptotically  (as  the  dimensionality  of  the  data  vectors  and  the  number  of  vector  quantization 
codebook  vectors  go  to  infinity)  approach  1.423...  times  that  of  an  optimal  source  coder  with 
the same number of codes. We also proved that the Cottrell/Munro/Zipser neural network
technique  [Cottrell,  Hecht-Nielsen,  Kramer]  can  be  used  to  approximate  a  data  neutronium 
compression  and  decompression  system  to  any  desired  degree  of  accuracy.  Thus,  the  data 
neutronium  method  has  a  potential  performance  advantage  over  all  other  known  practical  data 
compression methods (as no other such method has had a comparable mathematical result
established  for  it). 
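
To illustrate only the idea of using a grid of points as a vector quantization codebook, the sketch below quantizes a short vector onto a uniform, axis-aligned grid and packs the grid indices into a single integer code. This is a simplified stand-in under stated assumptions: the grid in the data neutronium method is defined on the data manifold through a generalization of principal component analysis, which is not reproduced here, and the function names, grid range, and number of levels are all hypothetical.

```python
import math

def make_axis(low, high, levels):
    """Evenly spaced coordinate values along one axis of the grid."""
    step = (high - low) / (levels - 1)
    return [low + i * step for i in range(levels)]

def encode(vector, axis):
    """Snap each coordinate to its nearest grid level and pack the indices into one code."""
    code = 0
    for x in vector:
        index = min(range(len(axis)), key=lambda i: abs(axis[i] - x))
        code = code * len(axis) + index
    return code

def decode(code, axis, dims):
    """Unpack an integer code into the grid point (codebook vector) it names."""
    indices = []
    for _ in range(dims):
        code, index = divmod(code, len(axis))
        indices.append(index)
    return [axis[i] for i in reversed(indices)]

axis = make_axis(0.0, 1.0, levels=4)  # assumed: 4 levels per coordinate
dims = 3                              # assumed: 3 coordinates, so 4**3 = 64 grid points
vector = [0.10, 0.52, 0.95]

code = encode(vector, axis)
restored = decode(code, axis, dims)
bits = dims * math.log2(len(axis))
print(f"code {code}, restored {restored}, {bits:.0f} bits per vector")
```

Because the codebook is a regular grid, it never has to be stored or searched exhaustively: the encoder computes the code directly from the coordinates, and the decoder regenerates the grid point from the code.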

Although theoretical results are of great importance, for any data compression method the true
test  is  its  performance  on  real  data.  During  Phase  I  of  this  project  we  developed  and  iteratively 
improved  an  experimental  data  neutronium  compression  system  based  upon 
Cottrell/Munro/Zipser neural networks. The data used for these experiments were 100 video
images  of  indoor  office  scenes.  These  neural  network  systems  were  each  trained  using  three  of 
the  100  images  (another  three  were  used  as  a  training  test  set  to  monitor  training  progress). 
The best systems (of which there were three, all having nearly the same performance) were then
tested using the remaining 94 fresh images. The best performance achieved was a compression
ratio of 64 at a per-pixel RMS grey scale error level of 20.7 (which thus exceeded our Phase I
technical  objective  of  a  compression  ratio  of  50  at  an  RMS  error  level  of  25). 

3.  Publication  List 

Hecht-Nielsen,  R.,  “Data  neutronium:  a  near-optimal  real-time  neural  network  data  compression 
method”,  draft  manuscript  currently  being  privately  circulated  for  comments,  to  be  submitted 
for  publication  in  Spring  1993. 

4.  Scientific  Personnel 

The following scientific personnel participated in Phase I of this project:

• Dr. Robert Hecht-Nielsen, Principal Investigator

•  Dr.  Shinmin  Wang,  Project  Scientist 

5.  Inventions 

The  following  invention  was  conceived  in  Phase  I  of  this  project: 

• Data Neutronium Source Coding and Data Compression Process

A  patent  for  this  invention  is  expected  to  be  filed  by  HNC,  Inc.  later  in  1993. 